Exploring Novel Many-core Architectures for Scientific Computing
Authors
Abstract
The rapid shift in microprocessor chip architecture brought about by many-core technology presents unprecedented challenges to application developers and system software designers alike: how can the computational potential of many-core architectures best be exploited? This dissertation studies programming issues for many-core architectures, and its contributions fall into two main areas.

Optimizing the Fast Fourier Transform for IBM Cyclops-64

To understand the issues involved in designing and developing high-performance algorithms for many-core architectures, we use the fast Fourier transform (FFT) as a case study on the IBM Cyclops-64 many-core chip architecture. We analyze the optimization challenges and opportunities for FFT problems, identify domain-specific features of the target problems, and match them to key features of the many-core architecture. We quantify the impact of various optimization techniques and the effectiveness of the target architecture. The resulting FFT implementations achieve excellent results in terms of both speedup and absolute performance.

To assist algorithm design and performance analysis, we present a model that estimates the performance of parallel FFT algorithms on an abstract many-core architecture. This abstract architecture captures generic features and parameters of several real many-core architectures, so the performance model is applicable to any architecture with similar features. We derive the model from cost functions for the three main components of an execution: memory accesses, computation, and synchronization. The
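The decomposition of execution cost into memory accesses, computation, and synchronization can be illustrated with a small sketch. This is not the dissertation's model: the `Machine` parameters, the per-butterfly operation counts, and the assumptions of perfect load balance and no overlap between components are all hypothetical, chosen only to show how three cost functions might be combined for a radix-2 FFT.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    """Hypothetical abstract many-core machine; every parameter is an assumption."""
    threads: int           # number of hardware threads sharing the work
    flop_cycles: float     # cycles per real floating-point operation
    mem_cycles: float      # cycles per real word loaded or stored
    barrier_cycles: float  # cycles per global barrier


def fft_cost(n: int, m: Machine) -> dict:
    """Estimate cycles for a radix-2 FFT of n points (n a power of two).

    Assumes perfect load balance and no overlap of the three components.
    """
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two >= 2")
    stages = n.bit_length() - 1          # log2(n) butterfly stages
    butterflies = (n // 2) * stages      # n/2 butterflies per stage

    # One radix-2 butterfly: 1 complex multiply (6 real flops)
    # plus 2 complex additions (4 real flops).
    flops = 10 * butterflies
    # Naive traffic per butterfly: load 2 points and 1 twiddle factor,
    # store 2 points; 5 complex values = 10 real words.
    words = 10 * butterflies

    computation = flops * m.flop_cycles / m.threads
    memory = words * m.mem_cycles / m.threads
    synchronization = stages * m.barrier_cycles  # one barrier per stage
    return {
        "memory": memory,
        "computation": computation,
        "synchronization": synchronization,
        "total": memory + computation + synchronization,
    }


if __name__ == "__main__":
    # Illustrative parameter values only, not measurements of any real chip.
    machine = Machine(threads=128, flop_cycles=1.0, mem_cycles=2.0,
                      barrier_cycles=100.0)
    for name, cycles in fft_cost(1024, machine).items():
        print(f"{name:>16}: {cycles:.0f} cycles")
```

With a breakdown of this kind, one can see which component dominates for a given problem size and thread count. In the illustrative configuration above, the per-stage barriers cost more than the computation, which is the sort of trade-off such a model is meant to expose.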
Similar resources
Design of a novel congestion-aware communication mechanism for wireless NoC architecture in multicore systems
Hybrid Wireless Network-on-Chip (WNoC) architecture has emerged as a scalable communication structure that mitigates the deficits of traditional NoC architecture for future multi-core systems. The hybrid WNoC architecture provides energy-efficient, high-data-rate, and flexible communication for NoC architectures. In these architectures, each wireless router is shared by a set of processing core...
LightSpeed: A Many-core Scheduling Algorithm
The world is heading towards many-core architectures because of several well-known and important present-day research issues: power consumption, clock speed limits, critical path lengths, etc. While existing many-core machines have traditionally been handled in the same way as SMPs, this magnitude of parallelism introduces several fundamental challenges at the architectural level which translate to n...
Ultra-Low-Energy DSP Processor Design for Many-Core Parallel Applications
Background and Objectives: Digital signal processors are widely used in energy-constrained applications in which battery lifetime is a critical concern. Accordingly, designing ultra-low-energy processors is a major concern. In this work, as a first step, we propose a sub-threshold DSP processor. Methods: As our baseline architecture, we use a modified version of an existing ultra-low-power...
Efficient parallelization of the genetic algorithm solution of traveling salesman problem on multi-core and many-core systems
Efficient parallelization of genetic algorithms (GAs) on state-of-the-art multi-threading or many-threading platforms is a challenge because of the difficulty of scheduling hardware resources with respect to thread concurrency. In this paper, to resolve this problem, a novel method is proposed, which parallelizes the GA by designing three concurrent kernels, each of which running some depe...
High Performance Architectures for OMP Compressive Sensing Reconstruction Algorithm
Compressive Sensing (CS) is a novel scheme in which a signal that is sparse in a known transform domain can be reconstructed using fewer samples. The signal reconstruction techniques are computationally intensive and power-consuming, which makes them impractical for embedded applications. The paper presents novel architectures for the Orthogonal Matching Pursuit algorithm, one of the popular CS r...